This document displays the filled images produced by the fill_gaps.R script.
Different parameters were tested on the following data (note that two different weeks were used, one with good weekly coverage and one with poor coverage):
Region: Northwest Atlantic (NWA, 39 to 82 N, 42 to 95 W)
Sensor: MODIS
Resolution: 4km
Processing level: Level 3, binned (L3b)
Year: 2015
Weeks: 7, 22
Pixels outside 0-64 mg m^-3 removed
Days with < 5% coverage removed
ImputeEOF removes randomly sampled valid pixels for cross-validation. The number of pixels used is the maximum of 30 and 10% of the valid pixels. The function continues adding EOFs, computing the RMSE between the real and reconstructed cross-validation pixels, until the difference between the current RMSE and the RMSE of the previous iteration falls below a certain threshold (i.e. adding the most recent EOF did not meaningfully improve the RMSE). This threshold, the “tolerance”, differs depending on whether you’re filling data in linear space or in log space, since a log RMSE will be only a fraction of the size of a linear RMSE:
Tolerance for filling logged data: 0.001
Tolerance for filling linear data: 0.01
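As a rough sketch of the stopping rule described above (this is illustrative only, not the actual ImputeEOF internals; the RMSE sequence is made up):

```r
# Illustrative sketch of the RMSE-based stopping rule -- not the actual
# ImputeEOF internals. rmse_by_eof is a made-up vector of cross-validation
# RMSEs, one entry per candidate number of EOFs.
choose_n_eof <- function(rmse_by_eof, tol) {
  for (k in seq_along(rmse_by_eof)[-1]) {
    # Stop once the newest EOF improves the CV RMSE by less than the tolerance
    if ((rmse_by_eof[k - 1] - rmse_by_eof[k]) < tol) return(k - 1)
  }
  length(rmse_by_eof)
}

# Example: improvements shrink with each added EOF
rmse_seq <- c(0.40, 0.30, 0.26, 0.245, 0.2445)
choose_n_eof(rmse_seq, tol = 0.001)  # 4 EOFs: the 5th improved RMSE by only 0.0005
```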
We start by using a year of data to fill the gaps, and compare different methods below. Then, using the best options, we’ll try using a longer time series.
For each method of filling gaps, we’ll examine the following:
The linear regression uses the standard major axis (SMA) method from lmodel2::lmodel2(), since it minimizes the areas of the triangles formed between the points and the fitted line, rather than the distance in the x or y direction alone (i.e. it assumes there is error in both the independent and dependent variables, the “real” and filled/reconstructed data).
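For illustration, the SMA slope and intercept can also be computed directly; the hand-rolled function below is just a sketch of what SMA does (the report itself uses lmodel2::lmodel2() and takes the "SMA" row of its regression results):

```r
# Minimal hand-rolled standard major axis (SMA) fit, for illustration only.
# SMA slope = sign(cor(x, y)) * sd(y) / sd(x); it treats x and y symmetrically,
# unlike ordinary least squares, which minimizes vertical distances only.
sma_fit <- function(x, y) {
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Example: y has exactly twice the spread of x, so the SMA slope is 2
sma_fit(x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10))  # intercept 0, slope 2
```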
Also note that for the tests that involve filling an 8day composite, in situ matchups should be interpreted with caution because of the long temporal bin and the changes that could occur in concentrations and patterns within that time span.
An analysis of DINEOF on the Canadian Pacific coast:
Hilborn A, Costa M. Applications of DINEOF to Satellite-Derived Chlorophyll-a from a Productive Coastal Region. Remote Sensing. 2018; 10(9):1449. https://doi.org/10.3390/rs10091449
Chla algorithm: OCx
Logged/linear data: Logged
Which is better - filling the gaps in 8day data, or filling gaps in daily data and then averaging it into an 8day image?
Although some R^2 metrics appear better for the daily filled version, overall the 8day cross-validation data has a better fit and less bias (e.g. it identifies some patterns of higher concentration better than the daily fill).
8day fill:
Number of EOFs: 5
Total RMSE: 0.2224261
Week 7 RMSE: 0.287156
Week 22 RMSE: 0.1992027

Daily fill, averaged into an 8day composite:
Number of EOFs: 11
Total RMSE: 0.2062114
Week 7 RMSE: 0.372335
Week 22 RMSE: 0.1860863
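For reference, the total and weekly RMSE values reported in these comparisons are root mean squared errors over the cross-validation pixels (for the logged runs, presumably computed on the log10-transformed values); a minimal version:

```r
# RMSE between real and reconstructed cross-validation pixels, as summarized
# in the Total / Week 7 / Week 22 values (for logged runs, both inputs are
# log10(chl) rather than chl in mg m^-3).
cv_rmse <- function(real, reconstructed) {
  sqrt(mean((real - reconstructed)^2, na.rm = TRUE))
}

cv_rmse(c(0.1, 0.5, 0.9), c(0.2, 0.4, 1.0))  # 0.1
```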
Temporal binning: 8day
Logged/linear data: Logged
Should the OCx or POLY4 algorithm be used? Note that POLY4 has been shown to remove some of the bias in the NWA.
OCx = global band-ratio
POLY4 = regional band-ratio, tuned to NWA
The POLY4 algorithm does appear to remove some of the bias and improve the validity of the reconstructed values.
OCx:
Number of EOFs: 5
Total RMSE: 0.2224261
Week 7 RMSE: 0.287156
Week 22 RMSE: 0.1992027

POLY4:
Number of EOFs: 6
Total RMSE: 0.257584
Week 7 RMSE: 0.3356319
Week 22 RMSE: 0.2588712
Temporal binning: 8day
Chla algorithm: POLY4
Should we use logged data or linear data to fill the gaps?
Note the process for the log option:
Logged data gives a smoother fill as it is not negatively impacted by isolated spikes over relatively low and consistent concentrations.
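A sketch of the log-space workflow (assumptions: chl is log10-transformed before filling and back-transformed after; `mean_fill` below is a hypothetical stand-in for the DINEOF step, used only to show why log space resists spikes):

```r
# Sketch of filling in log space: transform, fill, back-transform.
# fill_fun stands in for the DINEOF/ImputeEOF step (here a simple
# per-pixel mean fill, purely for illustration).
fill_in_log_space <- function(chl, fill_fun) {
  logged <- log10(chl)       # isolated spikes are compressed in log space
  filled <- fill_fun(logged) # gap filling happens on the logged field
  10^filled                  # back to mg m^-3
}

mean_fill <- function(m) {
  apply(m, 2, function(col) { col[is.na(col)] <- mean(col, na.rm = TRUE); col })
}

# 3 days x 1 pixel, with a spike on day 2 and a gap on day 3:
chl <- matrix(c(1, 100, NA), nrow = 3)
fill_in_log_space(chl, mean_fill)[3, 1]  # 10 (geometric mean); a linear-space mean fill would give 50.5
```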
Logged:
Number of EOFs: 6
Total RMSE: 0.257584
Week 7 RMSE: 0.3356319
Week 22 RMSE: 0.2588712

Linear:
Number of EOFs: 5
Total RMSE: 1.806879
Week 7 RMSE: 2.12901
Week 22 RMSE: 1.298033
If more satellite images are used in the algorithm, will it improve the results?
Hilborn and Costa (2018) found that pixel reconstruction improved with more data in a smaller region on the Canadian Pacific coast. Up until this point we have only used one year of data to fill the gaps, but here we’ll try adding more (an equal number of years on either side of the target year, 2015).
Note that the multi-year (3/5/7 year) DINEOF runs use the same cross-validation pixels for 2015, with extra randomly selected pixels from the remaining years. Also, the CV regression below is performed using only the CV pixels for 2015, to give a more accurate comparison between methods.
Overall, expanding the time series seems to give a slight improvement to the results. Based on the RMSE summary plot at the bottom, it appears as though the best results are achieved when using ~ 3 years of data to fill the gaps, after which there are only very slight improvements to weeks with good percent coverage, and the RMSE for weeks with bad percent coverage starts rising.
1 year:
Number of EOFs: 6
Total RMSE: 0.257584
Week 7 RMSE: 0.3356319
Week 22 RMSE: 0.2588712

3 years:
Number of EOFs: 11
Total RMSE: 0.2314012
Week 7 RMSE: 0.2961003
Week 22 RMSE: 0.2433103

5 years:
Number of EOFs: 13
Total RMSE: 0.2248182
Week 7 RMSE: 0.3098051
Week 22 RMSE: 0.2395364

7 years:
Number of EOFs: 15
Total RMSE: 0.2219548
Week 7 RMSE: 0.3148155
Week 22 RMSE: 0.2332346
Number of EOFs for 1/3/5/7 years: 6/11/13/15
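The multi-year runs can be sketched as stacking additional rows onto the (time x pixel) input matrix, an equal number of years on either side of the 2015 target. The function and list names below are hypothetical:

```r
# Hypothetical sketch of building the multi-year input: DINEOF works on a
# (time x pixel) matrix, so using more years just means stacking more rows,
# centered on the 2015 target year.
stack_years <- function(mats_by_year, target_year = 2015, n_either_side = 1) {
  yrs <- as.integer(names(mats_by_year))
  keep <- abs(yrs - target_year) <= n_either_side
  do.call(rbind, mats_by_year[keep])
}

# Toy example: one composite (1 row) per year, 2 pixels each
mats <- list(`2013` = matrix(0, 1, 2), `2014` = matrix(1, 1, 2),
             `2015` = matrix(2, 1, 2), `2016` = matrix(3, 1, 2))
nrow(stack_years(mats, 2015, n_either_side = 1))  # 3 rows: years 2014-2016
```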
Here we’ll try adjusting the region used to fill the data.